NetLogo User Community Models
WHAT IS IT?
This model implements Q-learning (Watkins 1989), a one-step temporal-difference algorithm from reinforcement learning, a branch of artificial intelligence and machine learning.
HOW IT WORKS
The agent (an ant) moves to a high-value patch, receives a reward, and updates the learned value of the previous patch with that reward, using the following rule:
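Q(s,a) = Q(s,a) + step-size * [reward + discount * max(Q(s',a')) - Q(s,a)]
Here s and a are the current state and action, s' is the resulting state, and the max is taken over the actions a' available in s'.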
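The update rule can be sketched in a few lines of Python (not the model's NetLogo code). This is a minimal illustration under stated assumptions: the table layout, the function name `q_update`, the action names, and the example states are all hypothetical, and unseen state-action pairs are assumed to start at 0.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, step_size, discount):
    """Apply one Q-learning update in place and return the new Q(s,a)."""
    # Highest learned value reachable from the next state (0.0 if unseen).
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference error: the bracketed term of the update rule.
    td_error = reward + discount * best_next - q[(state, action)]
    q[(state, action)] += step_size * td_error
    return q[(state, action)]

# Hypothetical example: two identical updates for the same move.
q = defaultdict(float)                       # assumed: all values start at 0
actions = ["north", "east", "south", "west"]  # assumed action set
first = q_update(q, (0, 0), "east", 10.0, (1, 0), actions,
                 step_size=0.5, discount=0.9)
second = q_update(q, (0, 0), "east", 10.0, (1, 0), actions,
                  step_size=0.5, discount=0.9)
```

With a step-size of 0.5, each update moves the stored value halfway towards the target, so repeated updates approach the reward of 10 without jumping to it at once.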
The agent keeps moving until it hits a blue patch, which gives a reward of -10 points, or reaches the goal patch, which gives a reward of +10 points. Either event ends the episode and resets the agent to the starting position.
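The episode structure described above might look like the following Python sketch. It is only an illustration: the one-dimensional corridor, the constants `START`, `TRAP`, and `GOAL`, the reward of 0 for ordinary moves, and the `max_steps` cap are assumptions made here and are not taken from the model.

```python
START, TRAP, GOAL = 2, 0, 4  # hypothetical 1-D corridor standing in for the maze

def step(state, action):
    """Move one cell left (-1) or right (+1); return (next_state, reward, done)."""
    next_state = state + action
    if next_state == TRAP:
        return next_state, -10.0, True   # blue patch: penalty, episode ends
    if next_state == GOAL:
        return next_state, 10.0, True    # goal patch: reward, episode ends
    return next_state, 0.0, False        # assumed: ordinary moves give no reward

def run_episode(policy, max_steps=100):
    """Run one episode from START and return (total_reward, steps_taken)."""
    state, total, steps = START, 0.0, 0
    done = False
    while not done and steps < max_steps:
        state, reward, done = step(state, policy(state))
        total += reward
        steps += 1
    return total, steps

# Two fixed policies: always move right, always move left.
to_goal = run_episode(lambda s: 1)
to_trap = run_episode(lambda s: -1)
```

Every call to `run_episode` starts again from `START`, which mirrors the reset of the agent at the end of each episode.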
HOW TO USE IT
The buttons and sliders control the setup and all of the algorithm's parameters. The graph shows the average reward obtained per episode. Step-size controls how far old values are moved towards new values on each update. Discount is the present value of future rewards. Exploration-% is the percentage of moves the agent makes towards a non-optimal patch, which helps the agent explore more of the maze and avoid getting stuck in local optima.
THINGS TO NOTICE
The average reward in the graph increases with the number of episodes the agent has trained on, which shows the agent's learning process.
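A curve of this kind could be computed as in the Python sketch below. The model's plotting code is not shown here, so this assumes the plotted value is the cumulative mean of the total reward per episode; the function name `running_average` and the sample rewards are hypothetical.

```python
def running_average(episode_rewards):
    """Return the cumulative mean of the rewards after each episode."""
    averages = []
    total = 0.0
    for count, reward in enumerate(episode_rewards, start=1):
        total += reward
        averages.append(total / count)
    return averages

# Hypothetical rewards: early failures followed by later successes.
curve = running_average([-10, -10, 10, 10, 10])
```

Under this assumption the curve starts at -10 and rises as successful episodes accumulate, which is the trend the graph is meant to show.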
THINGS TO TRY
Experiment with the algorithm parameters such as step-size, discount, and exploration-%.
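When experimenting with exploration-%, it may help to see how such a parameter is commonly applied. The Python sketch below uses epsilon-greedy selection, a standard technique that may differ from what the model does internally: the function name `choose_action` and the dictionary layout are assumptions, and the exploratory move here is drawn uniformly from all actions, so it can occasionally coincide with the best one.

```python
import random

def choose_action(q_values, exploration_pct, rng=random):
    """Pick an action from {action: value}; exploration_pct % of moves are random."""
    if rng.random() * 100 < exploration_pct:
        return rng.choice(list(q_values))    # explore: any action, chosen uniformly
    return max(q_values, key=q_values.get)   # exploit: highest learned value

# Hypothetical learned values for the moves available from one patch.
values = {"north": 1.0, "east": 3.0, "south": -2.0}
greedy = choose_action(values, exploration_pct=0)
explored = choose_action(values, exploration_pct=100)
```

Setting the parameter to 0 makes the agent purely greedy, while 100 makes every move random; values in between trade off exploiting known paths against discovering new ones.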
EXTENDING THE MODEL
Implement different reward schemes that encourage more direct, optimal paths. For example, a reward of -1 point for every move would push the agent to find a shorter route to the goal square.
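One way to prototype such a scheme is sketched below in Python. The patch labels (`"goal"`, `"trap"`, `"open"`), the function names, and the choice to give the final move the goal reward without an extra penalty are all assumptions made for illustration.

```python
GOAL_REWARD = 10.0
TRAP_REWARD = -10.0

def reward_for(patch_kind, step_penalty=-1.0):
    """Reward for entering a patch; ordinary moves cost step_penalty."""
    if patch_kind == "goal":
        return GOAL_REWARD
    if patch_kind == "trap":
        return TRAP_REWARD
    return step_penalty

def path_return(n_moves, step_penalty=-1.0):
    """Undiscounted return of a path whose last move enters the goal."""
    ordinary = ["open"] * (n_moves - 1)
    return sum(reward_for(k, step_penalty) for k in ordinary) + reward_for("goal")

# Compare a 3-move path with a 6-move path to the goal.
short_path = path_return(3)
long_path = path_return(6)
no_penalty = (path_return(3, step_penalty=0.0), path_return(6, step_penalty=0.0))
```

With the per-move penalty the shorter path earns a higher return, whereas with a penalty of 0 both paths earn the same undiscounted return, so only the discount would separate them.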
CREDITS AND REFERENCES
Written by Joe Roop (Spring 2006): Joseph.Roop@asdl.gatech.edu